Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi
Deep reinforcement learning has generated superhuman AI in competitive games such as Go and StarCraft. Can similar learning techniques create a superior AI teammate for human-machine collaborative games? Will humans prefer AI teammates that improve objective team performance or those that improve subjective metrics of trust? In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of the human-AI team performance, we also quantify subjective measures of the human's perceived performance, teamwork, interpretability, trust, and overall preference for an AI teammate. We find that humans have a clear preference toward a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite no statistical difference in the game score. This result has implications for future AI design and reinforcement learning benchmarking, highlighting the need to incorporate subjective metrics of human-AI teaming rather than a singular focus on objective task performance.
In Pursuit of Predictive Models of Human Preferences Toward AI Teammates
Siu, Ho Chit, Peña, Jaime D., Zhou, Yutai, Allen, Ross E.
We seek measurable properties of AI agents that make them better or worse teammates from the subjective perspective of human collaborators. Our experiments use the cooperative card game Hanabi, a common benchmark for AI-teaming research. We first evaluate AI agents on a set of objective metrics based on task performance, information theory, and game theory, which are measurable without human interaction. Next, we evaluate subjective human preferences toward AI teammates in a large-scale (N=241) human-AI teaming experiment. Finally, we correlate the AI-only objective metrics with the human subjective preferences. Our results refute common assumptions from prior literature on reinforcement learning, revealing new correlations between AI behaviors and human preferences. We find that the final game score a human-AI team achieves is less predictive of human preferences than esoteric measures of AI action diversity, strategic dominance, and ability to team with other AI. In the future, these correlations may help shape reward functions for training human-collaborative AI.
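The abstract lists action diversity among the information-theoretic metrics but does not define it. One common way to quantify diversity, assumed here for illustration, is the Shannon entropy of an agent's empirical action distribution; the paper's actual metric may be computed differently (for example, conditioned on game state). The action labels below are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def action_entropy_bits(actions: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `actions`.

    Assumed stand-in for an "action diversity" metric: 0.0 means the agent
    always takes the same action; higher values mean more varied behavior.
    """
    if not actions:
        return 0.0
    n = len(actions)
    counts = Counter(actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical Hanabi action log (labels are illustrative, not from the paper).
log = ["play", "hint_color", "discard", "play"]
print(action_entropy_bits(log))  # prints 1.5
```

In practice such a metric would be averaged over many self-play games per agent before being correlated with human ratings.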
A Minimal Approach for Natural Language Action Space in Text-based Games
Ryu, Dongwon Kelvin, Fang, Meng, Pan, Shirui, Haffari, Gholamreza, Shareghi, Ehsan
Text-based games (TGs) are language-based interactive environments for reinforcement learning. While language models (LMs) and knowledge graphs (KGs) are commonly used for handling the large action space in TGs, it is unclear whether these techniques are necessary or overused. In this paper, we revisit the challenge of exploring the action space in TGs and propose $\epsilon$-admissible exploration, a minimal approach of utilizing admissible actions during the training phase. Additionally, we present a text-based actor-critic (TAC) agent that produces textual commands for games solely from game observations, without requiring any KG or LM. Our method, on average across 10 games from Jericho, outperforms strong baselines and state-of-the-art agents that use LMs and KGs. Our approach highlights that a much lighter model design, with a fresh perspective on utilizing the information within the environments, suffices for an effective exploration of exponentially large action spaces.
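A minimal sketch of the exploration idea, assuming an epsilon-greedy-style rule: at each training step, with probability $\epsilon$ the executed command is sampled uniformly from the environment's admissible actions (Jericho can supply such a list), and otherwise the command generated by the policy is kept. The function name, the uniform sampling, and the example commands are assumptions for illustration; the paper's exact rule may differ, and at evaluation time the agent would use only its own generated commands.

```python
import random
from typing import Sequence


def epsilon_admissible_action(policy_command: str,
                              admissible_actions: Sequence[str],
                              epsilon: float,
                              rng: random.Random) -> str:
    """Choose the command to execute at one training step (hypothetical helper).

    With probability `epsilon` (and only if admissible actions are available),
    return an admissible action sampled uniformly at random; otherwise keep
    the command produced by the policy.
    """
    if admissible_actions and rng.random() < epsilon:
        return rng.choice(admissible_actions)
    return policy_command


# Usage with a seeded RNG and a hypothetical admissible-action list.
rng = random.Random(0)
admissible = ["open mailbox", "go north", "take leaflet"]
cmd = epsilon_admissible_action("eat mailbox", admissible, epsilon=0.3, rng=rng)
```

Keeping the policy's own command most of the time lets the agent learn to generate text, while occasional admissible actions steer exploration toward commands the game actually accepts.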
Can You Improve My Code? Optimizing Programs with Local Search
Abdollahi, Fatemeh, Ameen, Saqib, Taylor, Matthew E., Lelis, Levi H. S.
This paper introduces a local search method for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines. POLIS improves a single line of the program while keeping the remaining lines fixed, using existing brute-force synthesis algorithms, and continues iterating until it is unable to improve the program's performance. POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants' programs with respect to the game scores. A proof-of-concept demonstration on existing Stack Overflow code measures applicability in real-world problems. These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.
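The core loop described above (change one line, keep the rest fixed, repeat until nothing improves) is a line-wise hill climb. The sketch below assumes a hypothetical `candidates` function standing in for the brute-force synthesizer and a toy two-instruction "language" in place of real game-playing programs; it illustrates the search structure only and is not the POLIS implementation.

```python
from typing import Callable, Iterable, List

Program = List[str]


def polis_style_search(program: Program,
                       candidates: Callable[[Program, int], Iterable[str]],
                       score: Callable[[Program], float]) -> Program:
    """Improve one line at a time, keeping all other lines fixed, until no
    single-line replacement increases the score (a local optimum)."""
    best, best_score = list(program), score(program)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for line in list(candidates(best, i)):
                trial = best[:i] + [line] + best[i + 1:]
                trial_score = score(trial)
                if trial_score > best_score:  # accept strict improvements only
                    best, best_score, improved = trial, trial_score, True
    return best


# Hypothetical toy language: each line is "add k" or "mul k", applied to x = 1.
def run(program: Program) -> int:
    x = 1
    for line in program:
        op, arg = line.split()
        x = x + int(arg) if op == "add" else x * int(arg)
    return x


def candidates(program: Program, i: int) -> Iterable[str]:
    # Stand-in for a brute-force synthesizer: enumerate every one-line option.
    return [f"{op} {k}" for op in ("add", "mul") for k in range(1, 4)]


def score(program: Program) -> float:
    return -abs(run(program) - 20)  # objective: bring x as close to 20 as possible


start = ["add 1", "add 1", "add 1"]
improved_program = polis_style_search(start, candidates, score)
```

Because only strict improvements are accepted and the score is bounded, the loop terminates; the result is optimal with respect to single-line edits, not necessarily globally.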
Development of a Trust-Aware User Simulator for Statistical Proactive Dialog Modeling in Human-AI Teams
Kraus, Matthias, Riekenbrauck, Ron, Minker, Wolfgang
Human-AI teaming (HAIT) requires close coordination between humans and AI teammates to work together towards a common goal [40]. Effective communication, prediction of teammates' actions, and high-level coordination are essential components of this collaborative effort. In this regard, the proactive behavior of AI-based systems and the communication thereof during collaboration is an important research topic concerning HAITs, e.g., see Horvitz et al. [8]. Proactivity can be defined as an AI's self-initiating, anticipatory behavior for contributing to effective and efficient task completion. It has been shown to be essential for human teamwork, as it leads to higher job and team performance and is associated with leadership and innovation [3]. However, the design of adequate proactivity for AI-based systems to support humans is still an open question and a challenging topic. It is essential to study the impact of proactive system actions on the human-agent trust relationship and how to use information about an AI agent's perceived trustworthiness to model appropriate proactive dialog strategies for forming effective HAITs.
Stackelberg Games for Learning Emergent Behaviors During Competitive Autocurricula
Yang, Boling, Zheng, Liyuan, Ratliff, Lillian J., Boots, Byron, Smith, Joshua R.
Autocurricular training is an important sub-area of multi-agent reinforcement learning (MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented with autocurricular training on physically grounded problems, such as robust control and interactive manipulation tasks. However, the asymmetric nature of these tasks makes the generation of sophisticated policies challenging. Indeed, the asymmetry in the environment may implicitly or explicitly provide an advantage to a subset of agents, which could in turn lead to a low-quality equilibrium. This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG), which formulates a two-player MARL problem as a Stackelberg game with one player as the "leader" and the other as the "follower" in a hierarchical interaction structure wherein the leader has an advantage. We first demonstrate that the leader's advantage from ST-MADDPG can be used to alleviate the inherent asymmetry in the environment. By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.
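The leader's advantage in a Stackelberg game comes from optimizing against the follower's anticipated response. A common way to express this in gradient-based learning, from the Stackelberg learning-dynamics literature, is to give the leader the total derivative of its cost through the follower's best response, computed with the implicit function theorem. The sketch below applies that update to a hypothetical two-variable quadratic game; it illustrates the concept only and is not the ST-MADDPG implementation, which works with deep deterministic policies.

```python
# Toy scalar game with hypothetical costs (not from the paper):
#   leader cost   f1(x, y) = (x - 1)^2 + (y - 2)^2
#   follower cost f2(x, y) = (y - A * x)^2, so the follower's best response is y = A * x
A = 1.0


def grad_x_f1(x, y):
    return 2.0 * (x - 1.0)


def grad_y_f1(x, y):
    return 2.0 * (y - 2.0)


def grad_y_f2(x, y):
    return 2.0 * (y - A * x)


# Second derivatives of f2 (constant for this quadratic game).
D2F2_DYDY = 2.0
D2F2_DYDX = -2.0 * A


def nash_leader_grad(x, y):
    # Simultaneous play: the leader ignores how the follower will react.
    return grad_x_f1(x, y)


def stackelberg_leader_grad(x, y):
    # Implicit function theorem: the follower's response r(x) satisfies
    # dr/dx = -(d2f2/dy2)^-1 * d2f2/dydx.
    dr_dx = -D2F2_DYDX / D2F2_DYDY
    # Total derivative of f1(x, r(x)) with respect to x.
    return grad_x_f1(x, y) + dr_dx * grad_y_f1(x, y)


def play(leader_grad, steps=2000, lr=0.05):
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = leader_grad(x, y), grad_y_f2(x, y)
        x, y = x - lr * gx, y - lr * gy  # simultaneous gradient steps
    return x, y


def leader_cost(x, y):
    return (x - 1.0) ** 2 + (y - 2.0) ** 2


nash_point = play(nash_leader_grad)
stackelberg_point = play(stackelberg_leader_grad)
```

In this toy game, simultaneous play settles near (1, 1) with leader cost 1, while the Stackelberg update settles near (1.5, 1.5) with leader cost 0.5: the leader does better by accounting for the follower's reaction.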
Motivating Physical Activity via Competitive Human-Robot Interaction
Yang, Boling, Habibi, Golnaz, Lancaster, Patrick E., Boots, Byron, Smith, Joshua R.
Competition is ubiquitous in the natural world [1, 2] and in human society [3, 4, 5]. Despite its universality, competitive interaction has rarely been investigated in the field of Human-Robot Interaction (HRI), which has mainly focused on cooperative interactions such as collaborative manipulation, mobility assistance, feeding, and so on [6, 7, 8, 9, 10]. In some ways it is not surprising that competitive interaction has been overlooked: of course everyone wants a robot that can assist them; who would want a robot that thwarts their intentions? Yet, we also accept that human-human competition can be healthy and productive, for example in structured contexts such as sports. In this paper we explore the idea that human-robot competition can provide similar benefits. We believe that physical exercise settings such as athletic practice, fitness training, and physical therapy are scenarios in which competitive HRI can benefit users.
A non-cooperative meta-modeling game for automated third-party calibrating, validating, and falsifying constitutive laws with parallelized adversarial attacks
Wang, Kun, Sun, WaiChing, Du, Qiang
The evaluation of constitutive models, especially for high-risk and high-regret engineering applications, requires efficient and rigorous third-party calibration, validation and falsification. While there are numerous efforts to develop paradigms and standard procedures to validate models, difficulties may arise due to the sequential, manual and often biased nature of the commonly adopted calibration and validation processes, thus slowing down data collection, hampering the progress towards discovering new physics, increasing expenses and possibly leading to misinterpretations of the credibility and application ranges of proposed models. This work attempts to introduce concepts from game theory and machine learning techniques to overcome many of these existing difficulties. We introduce an automated meta-modeling game where two competing AI agents systematically generate experimental data to calibrate a given constitutive model and to explore its weaknesses, in order to improve experiment design and model robustness through competition. The two agents automatically search for the Nash equilibrium of the meta-modeling game in an adversarial reinforcement learning framework without human intervention. By capturing all possible design options of the laboratory experiments in a single decision tree, we recast the design of experiments as a game of combinatorial moves that can be resolved through deep reinforcement learning by the two competing players. Our adversarial framework emulates idealized scientific collaborations and competitions among researchers to achieve a better understanding of the application range of the learned material laws and prevent misinterpretations caused by conventional AI-based third-party validation.
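As a toy illustration of the equilibrium concept only (the paper searches for it with deep reinforcement learning over a much larger decision tree), the sketch below assumes a hypothetical two-stage game: a calibrating agent picks an experiment design, then a falsifying agent, seeing that choice, picks a test design that maximizes the calibrated model's prediction error. Backward induction over this small tree gives a subgame-perfect equilibrium. The design options and the error table are invented for illustration and do not come from the paper.

```python
# Hypothetical design options (not from the paper): loading path x confining pressure.
LOADING_PATHS = ("triaxial", "simple_shear")
PRESSURES = ("low", "high")
DESIGNS = [(path, p) for path in LOADING_PATHS for p in PRESSURES]


def model_error(calib, test):
    """Made-up prediction error of a model calibrated on `calib`, tested on `test`."""
    path_penalty = {("triaxial", "simple_shear"): 0.4,
                    ("simple_shear", "triaxial"): 0.2}.get((calib[0], test[0]), 0.0)
    pressure_penalty = {("low", "high"): 0.3,
                        ("high", "low"): 0.1}.get((calib[1], test[1]), 0.0)
    return path_penalty + pressure_penalty


def solve_by_backward_induction(designs, error):
    """Return (calibration design, falsifying test design, resulting error)."""
    best = None
    for calib in designs:
        # Second mover: the falsifier maximizes the error given the calibration design.
        attack = max(designs, key=lambda test: error(calib, test))
        value = error(calib, attack)
        # First mover: the calibrator minimizes its worst-case error.
        if best is None or value < best[2]:
            best = (calib, attack, value)
    return best


calib, attack, worst_error = solve_by_backward_induction(DESIGNS, model_error)
```

With these invented numbers, the calibrator's best choice is the design whose worst-case test error is smallest, and the falsifier's reply identifies where that calibrated model is weakest, which is the kind of information the adversarial framework is meant to surface.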
'Mozart would have made video game music': composer Eímear Noone on a winning art form
Eímear Noone got into composing and conducting video game music by accident. One day, while studying music at Trinity College Dublin, a fourth-year student came to the bar she was drinking in with members of the college chapel choir and offered them a few quid to help with the orchestration on a project of his. "I have a vivid memory of sitting on a studio floor somewhere in Dublin writing choral parts with my pals and then singing them," she says. "Six months later my brother calls me in a complete tizzy and says, 'Did you work on Metal Gear Solid?' I was like, 'No!' He says, 'Well, I'm looking at your name on the screen credits right now.'" And sure enough, the session she had contributed to for beer money was the soundtrack to Hideo Kojima's blockbusting adventure game. "Years later I was at the Bird's Nest in Beijing at the Olympic Stadium conducting this very piece of music," she says. Noone is now a hugely successful film and video game composer, having contributed scores for directors such as Gus Van Sant and Joe Dante, and for games such as World of Warcraft, Diablo III and Hearthstone. In November, she's presenting her second series of High Score, Classic FM's agenda-setting programme dedicated to game music. Underappreciated outside of game fandom for years, the genre is now huge business with dedicated orchestras playing sold-out global concert tours. And Noone is a passionate advocate – very keen to explore and explain the unique elements of the art form. There is, of course, a foundational similarity between game and film scores – they are both composed to accompany and accentuate screened action. But while a film score needs to accompany a two-hour linear experience with specific cues and events, video game music must be there for many hours of play. Most open-world action adventures, the likes of Assassin's Creed Origins, Witcher 3 and Final Fantasy XV, offer over 100 hours of narrative, but many players will spend much longer exploring. 
Music scores also have two different roles in games: they accompany the non-interactive cinematic sequences that set up the story and occur throughout a game – sort of like short animated movie sequences; and they provide background music while you play. "Cinematics are scored very similarly to a movie or an animated film.